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RANDOM  VARIATE  GENERATION:  A  SURVEY1 


Bruce  W.  Schmelser 
School  of  Industrial  Engineering 
Purdue  University 
West  Lafayette,  IN  47907 


ABSTRACT:  The  state  of  the  art  of  generating  random  vacates  on  a  digital 
computer  Is  surveyed.  General  concepts  are  presented,  followed  by  criteria 
for  comparing  algorithms.  The  literature  Is  surveyed  for  continuous  univariate, 
discrete  univariate,  continuous  multivariate,  and  discrete  multivariate  distribu¬ 
tions,  as  well  as  for  point  processes,  time  series,  order  statistics  ar.d 
geometrically  inspired  problems.  An  extensive  list  of  references  is  provided. 


1.  INTRODUCTION 


I  Assuming  the  existence  of  a  source  of  independent  'J(0,  1)  observations  u, ,  u-,  we  survey  the  state  i 

j  of  the  art  of  transforming  the  uniform  random  numbers  to  obtain  random  variates  x. ,  x«,  satisfying 

specified  properties  of  distribution  and/or  dependency  structure,  for  use  as  Inputs  to  stochastic 

simulation  experiments  on  digital  computers.  | 

Ue  assume  the  U(0,  1)  random  variables  are  ideal;  that  Is,  they  are  exactly  uniformly  distributed  over  '] 

the  Interval  (0,  1)  and  they  are  Independent.  The  consequences  of  this  assumption  not  being  entirely  ■ 

true  are  discussed  In  3urford  and  '.Jlllis  (1978),  Chay,  Fardo  and  (iazumdar  (1975),  Bolder  and  Settle  j 

(1976),  Honahan  (1973)  and  Heave  (1973).  Kennedy  and  Gentle  (1980)  provide  an  excellent  and  up-to-date  j 

discussion  of  U(0,  1)  generation. 

’  j  | 

f  Note  that  It  Is  possible,  although  very  uncommon,  to  use  distributions  other  than  U( 0,1 )  as  the  basic  j 

source  of  randomness.  Liinow  (1974),  for  example,  discusses  using  truely  random  Poisson  observations.  ; 

9  I  : 

(  Ue  discuss  general  underlying  concepts  in  Section  2  and  criteria  for  comparing  variate  generation  j 

v  algorithms  In  Section  3.  Section  4,  which  surveys  the  state  of  the  art  of  specific  problems,  considers  ; 

both  continuous  and  discrete  random  variables  and  random  vectors,  as  well  as  processes  correlated  and  1 

*  changing  over  time,  order  statistics,  and  geometrically  inspired  problems  such  as  generation  of  points 

uniformly  distributed  on  the  surface  of  a  sphere  and  random  permutations. 

I  '  For  completeness  there  are  a  few  references  listed  at  the  end  of  the  paper  which  are  not  discussed. 


2.  FUNDAMENTAL  CONCEPTS 


It  is  important  to  distinguish  between  the  fundamental  approaches  for  random  variate  generation  and  the 
resulting  algorithms.  Uhlle  occasionally  an  efficient  algorithm  results- from  the  direct  application  of 
a  single  concept,  more  often  an  algorithm  Is  a  combination  of  more  than  one  concept.  As  in  other  fields, 
such  as  mathematical  programming,  the  same  concepts  applied  In  much  the  same  way  can  still  lead  to  dif¬ 
ferent  algorithms  due  to  changes  in  data  structure  and  tailoring  to  specific  computer  efficiencies. 

He  discuss  four  fundamental  concepts:  (1)  inverse  transformation,  (2)  composition,  (3)  acceptance/ 
rejection,  and  (4)  special  properties.  Unlike  the  algorithms  discussed  In  Section  4,  these  concepts 
have  changed  only  little  since  variate  generation  was  first  studied.  Butler  (1956),  for  example, 
discusses  these  concepts.  Kennedy  and  Gentle  (1980)  discuss  both  basic  concepts  and  algorithms  and 
provide  an  extensive  recent  bibliography.  Other  general  references  include  Ahrens  and  Dieter  (1974b), 


1  Support  from  the  Office  of  Naval  Research  under  contract  N00014-79-C-0832  is  gratefully  acknowledged. 
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Chambers  (1970),  De4k  and  Bene  (1979),  Fishman  (1973,  1978b),  Galliher  (1959),  Handbook  of  Mathematical 
Functions  (1964),  Hamnersley  and  Handscomb  (1964),  Halton  (1970),  Jansson  (1966),  Kahn  (1964),  Knuth 
(1969),  Knuth  and  Yao  (1976),  P.A.W.  Lewis  (1972,  1979),  P.A.W.  Lewis  and  Learmonth  (1973),  T.G.  Lewis 
(1975),  McGrath  and  Irv<ng  (1973),  Newman  and  Odell  (1971),  Sowey  (1972,  1978),  Spanier  and  Gebhard 
(1969).  Teichrow  (1953,  1965),  and  Tocher  (1963). 


2.1  The  Inverse  Transformation 

The  use  of  the  inverse  of  the  cumulative  distribution  function  (cdf)  leads  to  the  most  fundamental 
method  for  generating  random  variates.  It  is  applicable  to  any  univariate  distribution,  whether 
discrete,  continuous,  or  mixed.  The  method. is  to  convert  the  U(0,1)  random  number  u  to  the  value  x 
lying  at  the  u  th  fractle;  that  is,  x  «  F" 1 ( u ) ,  which  is  analogous  to  a  percentile  test  score  of  u 
(or  lOOu)  with  a  corresiiondl ng  raw  score  of  x. 

First  consider  an  arbitrary  discrete  distribution  with  cdf  F(x).  The  probability  of  observing  x.  is 
F(xi+i)  -  F(xj)  and  any  method  of  assigning  this  probability  to  x^  is  a  valid  method.  However,  'the 

most  straightforward  procedure  is  to  return  x^  if  and  only  if  F(x-)  <  u  <  F(xi+1).  Some  care  must  be 

taken  on  the  end  points  to  be  sure  all  values  are  defined  and  to  avoid  round-off  error,  but  otherwise 
implementation  is  direct:  (1)  Generate  u  n,  U(0,1)  and  set  i*0,  (2)  set  1=i+1,  (3)  if  u  >  F^,  go  to  2, 

(4)  otherwise  return  x=Vj.  Here  two  vectors  are  needed:  F^  to  store  the  value  of  F(Xj)  and  v^  to  store 
the  value  of  x^,  1*1,  2,  ...,  n.  For  many  discrete  distributions,  the  explicit  use  of  F^  can  be  avoided 
since  a  recursive  relationship  can  be  used  to  calculate  F.  from  F^  . .  (For example,  see  the  discussion 

of  the  Poisson,  binomial,  and  negative  binomial  distributions  below,  as  well  as  the  geometric  distribu¬ 
tion  which  has  a  closed  form  inverse  transformation.)  Likewise,  the  vector  v  can  be  made  implicit  when 
simple  relationships  exist  between  v..  and  i,  such  as  v^i  or  v^=i-l.  Chen  and  Asau  (1974)  suggest  the 

use  of  index  tables  to  speed  the  search  for  the  proper  interval. 

A  similar  concept  applies  to  continuous  distributions,  where  now  we  want  P(a  <  x  <  b)  *  F(b)  -  F(a)  for 
all  values  of  a  and  b.  This  property  is  satisfied  when  x*  F-i(u)  is  used,  since  the  distribution  of 
the  random  variable  Y=F(X)  is  U(0,  1)  for  any  continuous  random  variable  X.  The  continuous  version  can 
also  be  obtained  by  considering  the  limiting  case  of  the  discrete  concept  as  the  intervals  xi+,  -  x., 
become  shorter.  1 

For  some  distributions  the  inverse  transformation  leads  to  closed  form  algorithms  which  may  be  implemen¬ 
ted  directly.  Examples  are  x  =  a  +  (b-a)u  for  X  ^  U(a,  b)  and  x  =  -(ln(l-u)/a)**(l/y)  for  the  Weibull 
distribution  with  shape  parameter  y  and  scale  parameter  a.  Note  that  y*l  yields  the  exponential 
distribution  with  mean  1/a. 

Numerical  methods  may  be  used  when  the  inverse  transformation  is  not  closed  form.  Butler  (1970) 
discusses  a  general,  although  approximate,  method  for  generating  random  variates  from  any  continuous 
distribution  via  numerical  integration  of  the  density  function.  (See  corrections  by  Proll  (1972).) 
Numerical  methods  for  the  normal,  gamma,  and  beta  distributions  are  referenced  in  Section  4.  When 
the  distribution  is  in  the  form  of  a  histogram  (a  mixture  of  uniform  distributions),  Barnard  and  Cawdery 
(1974)  suggest  using  an  approximate  but  fast  algorithm  based  on  approximating  the  distribution  with 
equally  likely  uniform  distributions  and  linear  interpolation. 

In  both  the  discrete  and  continuous  cases,  there  are  several  reasons  for  using  the  inverse  transformation 
even  if  slow  numerical  techniques  are  involved:  (1)  Order  statistics  can  be  easily  generated,  as  dis¬ 
cussed  in  Section  4,  (2)  truncated  distributions  may  be  generated  using  x*F“'(u')  where  u'*a+(b-a)u, 
resulting  in  F*‘(a)  <  x  <  F-1 (b) ,  (3)  the  use  of  variance  reduction  techniques  is  aided,  as  discussed  in 
Section  3. 


2.2  Composition 

Composition,  or  probability  mixing,  is  often  used  without  realizing  the  generality  of  the  method.  For 
example,  the  double  exponential  (LaPlace)  distribution  is  commonly  generated  by  obtaining  a  negative 
exponential  random  variate  and  assigning  a  random  sign.  Another  example  is  mixed  distributions,  such  as 
rainfall  in  a  particular  week,  where  zero  rainfall  may  occur  with  probability  p  and  the  amount  of 
rainfall,  conditional  on  there  being  some,  may  follow  a  gamma  distribution.  The  algorithm  is  to  set  x*0 
if  u  <  p  and  to  generate  a  gatima  variate  x  otherwise.  However,  composition  is  useful,  in  many  situations 
where  thi  concept  is  not  so  intuitively  applied. 

Composition,  like  the  inverse  transformation,  has  both  a  discrete  and  continuous  fdrm.  However,  the  type 
of  composition  is  independent  of  the  type  of  random  variable;  discrete  random  variables  can  be  mixed 
continuously  and  vice  versa.  We  first  consider  discrete  mixing. 

Let  f(x)  denote  the  density  function  If  X  1$  a  continuous  random  variable  or  the  probability  of  observing 
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x  If  X  Is  discrete.  Then  discrete  composition  Is  applicable  when  f(x)  Is  written  as 
f(x)  -  I  Pi  f,(x) 

where  £  p^  ■  1  and  n  may  be  Infinite.  The  generation  of  random  variates  from  f(x)  simply  requires 
generating  a  variate  x  from  f^(x)  with  probability  p^.  The  selection  of  1  Is  usually  via  the  discrete 
Inverse  transformation  and  the  generation  from  f^(x)  may  use  imy  algorithm.  In  the  double  exponential 
example,  f^x)  *  X  exp(-Ax)  I(x)(Q  and  fg(x)  =  X  exp(Xx)  I(x)j_m  0j,  where  I(x)^a  bj*l  If  a  <  x  <  b 

and  zero  otherwise,  and  p1  *  p?  =  .5.  In  the  rainfall  example,  p1  =  Pz>  P2  *  1  -  p2,  t'j(x)  «  I(x)^0 

and  f-(x)  Is  the  gamma  density  function. 

c  n 

Note  that  the  linear  combination  of  random  variables  X  *  Z  a.  X^  is  a  convolution  and  the  proper 

generation  procedure  Is  to  generate  each  of  the  n  random1 vlriates  x.  and  to  combine  them  as  Indicated 
In  the  linear  combination.  Do  not  confuse  convolution  and  composition. 

Discrete  compos It Ion  has  an  Intuitive  geometric  interpretation  in  terms  of  the  density  function  f(x), 
in  that  the  area  under  the  density  may  be  partitioned  in  any  way  to  form  the  n  subdensities  f,(x). 

In  the  case  of  the  double  exponential,  the  partition  between  che  two  subdensities  Is  vertical.1  A 
horizontal  partition  may  be  used  to  partition  a  trapezoidal  shaped  density  function  into  a  uniform 
(rectangular)  subdensity  and  a  triangular  subdensity.  The  ar-*a  of  each  subdensity  f^(x)  Is  Pj . 

Many  of  the  fastest  algorithms  for  uni  variate  continuous  distributions  use  discrete  composition.  See 
Marsaglia  (1961c)  who  discusses  the  concept  and  applies  it  to  the  normal  distribution.  Other 
applications  are  discussed  in  Section  4. 

One  of  the  most  Important  advances  In  the  generation  of  discrete  random  variables  Is  due  to  Walker 
(1974a,  1974b,  and  1977),  who  describes  the  concept  of  "aliasing"  for  distributions  having  a  finite 
number  of  possible  values.  Walker  noted  that  any  discrete  distribution  having  a  finite  number  of 
outcomes  can  be  expressed  as  a  mixture  of  n  distributions  each  having  exactly  two  outcomes  and  each 
having  coefficient  p^  *  1/n.  This  yields  the  very  fast  discrete  composition  algorithm  (1)  Generate 

u  n,  1/(0,  1),  set  u  =  u  n.  set  1  *  INT(u)  +  1,  set  u  =  1  -  u,  (2)  if  u  <  F<t  return  x  »  1,  (3)  otherwise 

return  x  *  Ait  where  it  is  assumed  that  x  •  1,  2,  ....  n  are  the  possible  values  of  x.  Here  1  has  a 
discrete  uniform  distribution  over  the  range  1,2 . n;  is  the  probability  that  x=1  and  1-F^  is 

the  probability  that  the  alias  value  x=A.  is  returned.  Kronmal  and  Peterson  (1979)  discuss  the  calcu¬ 
lation  of  F.  and  A.  for  1=1,  2,  ....  n  and  also  prove  that  the  method  is  applicable  for  all  distributions 
with  range  x  *  1,2,  ....  n.  Of  course  the  use  of  an  additional  vector  analagous  to  v  in  the  discrete 
inverse  transformation  allows  generation  from  any  discrete  distribution  with  a  finite  number  of  outcomes. 

Tabling  discrete  values,  which  results  in  very  fast  algorithms  at  the  expense  of  rounding  the  probab¬ 
ilities  and/or  using  large  tables,  is  a  composition  method.  Marsaglia  (1963)  discusses  an  ingenious 
modification  to  reduce  the  table  size.  See  also  Norman  and  Cannon  (1972). 

The  continuous  composition  algorithm  can  be  used  when  f(x)  is  expressed  as  f(x)  *  /“  fX|y(x)  dFy(y), 

where  Y  is  a  continuous  random  variable  mixing  conditional  density  functions  or  discrete  mass  functions 
fX|y(x).  Variate  generation  proceeds  in  two  steps:  (1)  Generate  a  continuous  random  variate  y  having 

cdf  Fy(y)  and  (2)  generate  a  random  variate  x  from  fX|y(x)-  Distributions  which  can  be  handled  in  this 

way  are  called  compound  distributions.  Examples  include  the  beta-binomial,  where  the  probability  of 
success  p  in  the  binomial  distribution  is  a  random  variable  with  a  beta  distribution.  Less  intuitive  is 
that  a  Pearson  type  IV  distribution  can  be  generated  as  a  gamma (n,  1/e)  with  6  being  a  gamma(fi,  y)  random 
variate,  where  gamma(a,  b)  denotes  the  gamma  distribution  with  shape  parameter  a,  scale  parameter  b,  and 
mean  ab.  Another  example  is  the  negative  binomial  discussed  below.  For  other  examples  of  compound 
distributions,  see  Johnson  and  Kotz  (1969). 

Note  that  since  a  x*  random  variable  is  the  square  of  a  standardized  normal  random  variable,  it  is  not 
unreasonable  to  consider  generating  a  normal  variate  using  a  xz  variate.  The  problem  arises  when  it  is 
noted  that  either  of  the  two  roots  of  the  xz  variate  corresponds  to  normal  variates.  Due  to  symmetry,  it 
.seems  reasonable  to  use  each  root  with  probability  .5,  which  is  correct.  Michael,  Schucany  and  Haas 
(1976)  derive  the  correct  multinomial  probabilities  for  selecting  one  of  multiple  roots,  leading  to  a 
simple  composition  algorithm  for  the  inverse  Gaussian  distribution,  as  an  example. 
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2.3  Acceptance/Rejection 


The  acceptance/ reject ion  concept  is  to  generate  variates  from  one  distribution  and  discard  (reject) 
some  of  them  in  such  a  way  that  the  remaining  variates  have  the  desired  distribution.  Although  until 
the  last  few  years  the  acceptance/rejection  concept  has  been  used  almost  exclusively  with  univariate 
continuous  distributions,  it  Is  valid  for  either  discrete  or  continuous  and  univariate  or  multivariate 
distributions. 

Let  f(x)  denote  the  density  function  of  X  if  X  is  a  continuous  random  variable  or  the  mass  function  'f 
X  Is  a  discrete  random  variable.  Here  X  may  be  either  univariate  or  multivariate.  Let  t(x)  be  any 
majorizing  function  of  f(x);  that  is,  we  require  that  t(x)  >  f(x)  for  all  values  of  x.  Let  g(x)  *  t(x)/c 
denote  the  density  function  proportional  to  t(x)  If  X  is  continuous  (in  which  case  c  *  t(x)  dx)  or 
the  mass  function  proportional  to  t(x)  if  X  is  discrete  (in  which  case  c  =  £x  t(xj).  Tfie  algorithm  is 

(1)  generate  x  n.  g(x),  (2)  generate  u  ^  U(0,  1),  (3)  if  u  >  f(x)/t(x),  then  go  to  step  1,  (A)  otherwise 
return  x. 

The  algorithm's  execution  time  depends  on  three  factors:  (1)  The  time  to  generate  x  in  step  1,  (2)  the 
time  to  perform  the  comparison  in  step  3,  and  (3)  the  expected  number  of  interations,  c,  to  return  x. 

The  selection  of  the  majorizing  function  t(x)  plays  a  major  role  in  all  three  factors,  making  it  crucial 
to  the  development  of  efficient  algorithms.  In  elementary  textbook  discussions  of  the  acceptance/ 
rejection  algorithm,  t(x)  -  max  f(x)  Is  usually  used,  as  originally  discussed  by  von  Neumann  (1951). 

Step  1  is  then  to  generate  a  uniform  variate  over  the  range  of  X, which  is  fast,  but  the  expected  number 
of  Iterations,  c,  is  often  unacceptably  large,  such  as  for  the  beta  distribution  over  the  interval  (C ,  1) 
as  the  shape  parameters  p  and/or  q  become  large,  as  discussed  in  detail  in  Section  4.  Many  recent 
algorithms  use  acceptance/rejection. 

The  basic  concept  can  be  made  more  efficient  by  adding  some  logic  between  steps  2  and  3.  Since  step  3 
often  requires  slow  exponential  type  operations,  preliminary  comparisons  using  simple  one-sided 
approximations  to  f(x)/t(x)  can  speed  up  an  algorithm  by  accepting  or  rejecting  x  before  f(x)/t(x)  is 
calculated.  This  modification  has  been  termed- the  "squeeze"  method  by  Marsaglla  (1978).  Marsaglia 
(1970)  discusses  one-sided  approximations. 

It  is  also  common  to  apply  two  "tricks"  to  step  3.  First,  f(x)  is  rescaled  to  avoid  having  to  calculate 
normalizing  constants  which  tend  to  Involve  hard  to  compute  constants  such  as  gamma  and  beta  functions. 
Since  the  shape  of  the  density  function  does  not  depend  on  these  normalizing  constants,  other  constants 
can  be  substituted.  Setting  the  normalizing  constant  to  1  sometimes  causes  numerical  problems,  however. 
Ahrens  and  Dieter  (1974)  rescale  the  gamma  distribution  so  that  max  f(x)  *  1,  thereby  avoiding  the  gamma 
function  as  well  as  numerical  problems.  The  second  "trick"  is  to  Compare  ln(u)  to  ln(f (x)/t(x) )  in 
step  3,  since  this  also  helps  to  avoid  numerical  problems,  often  eliminates  some  exponential  calculations, 
and  special  methods  exist  to  generate  ln(u)  directly  (as  the  negative  of  an  exponential  random  variate). 

Schmeiser  and  Lai  (1980),  Schmeiser  and  Babu  (1980)  and  Tadikamalla  (1978),  for  example,  use  acceptance/ 
rejection  to  generate  variates  from  subdensities  in  composition  algorithms.  Kronmal  and  Peterson  (1979b, 
1979c)  and  Kronmal,  Peterson  and  Lundberg  (1978)  combine  the  concepts  of  acceptance/rejection,  aliasing, 
and  discrete  composition.  Jeswani  and  Sikdar  (1978)  appear  to  have  recently  rediscovered  the  acceptance/ 
rejection  concept. 


2.4  Special  Properties 


Sometimes  the  distribution  from  which  random  variates  are  to  be  generated  has  one  or  more  special 
properties  which  can  be  used,  leading  to  methods  of  generation  which  are  specific  to  that  distribution. 
Three  topics  are  discussed  in  this  section:  transformations  from  nonunifohm  distributions,  generation  of 
trigonometric  functions  with  random  arguments,  and  von  Neumann's  comparison  method. 

Transformations  from  nonuniform  distributions 

Many  of  the  classical  methods  for  generating'  random  variates  from  cownon  distributions  are  based  on 
generating  some  intermediate  nonuniform  random  variates  y^ ,  y2,  ....  yn  and  then  calculating  the 

desired  variate  as  x  *  f(y1 ,  y2,  ....  yn).  For  n  ■  1,  examples  are  nonstandard  normal  via  x  *  u  ♦  oz, 

where  z  Is  a  standard  normal  variate;  U(a,  b)  via  x  *  a  +  (b-a)u;  and  lognormal  variates  via  x  *  exp(y), 
where  y  Is  the  appropriate  normal  variate. k  There  are  many  examples  for  n  >  1.  These  Include  Erlang  as 
the  sum  of  k  exponential  variates  (x*-  ln(n  uj),  beta  as  a  ratio  of  ganroas,  Student's  t  via  standardized 
normal  and  chi-square,  F  via  chi-squares,  chl’squares  via  normals,  binomial  as  a  sum  of  Bernoulli  trials, 
negative  binomial  as  a  sum  of  geometric  random  variates,  and  on  and  on.  Many  of  these  are  excellent 
approaches.  One  very • common  example  that  is  not  good,  because  it  is  a  rather  crude  approximation,  is 
the  approximation  of  the  normal  distribution  by  the  sum  of  twelve  uniform  variates,  x  *  u.+u,+- • •+u.2-6. 
The  kurtosis  is  2.9  rather  than  3  and  the  tails  are  truncated  at  +6.  If  a  very  simple  ginefation 
algorithm  is  needed,  such  as  when  using  a  hand  calculator,  an  easier  and  more  accurate  approximation  is 
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x  «  (u*135  -  (1-u),13’)/.»975.  as  discussed  In  Schraelser  (1980). 

•  An  important  special  case  of  transformations  from  Intermediate  random  variates  Is  the  ratlo-of-uniforms 
method  of  Kinderman  and  Monahan  (1976,  1977).  They  suggest  defining  a  region  R  so  that  conditional  on 
v  *  (v, ,  v.)  being  uniformly  distributed  over  R,  then  x  *  v,/v.  is  a  random  variate  from  the  distribution 
of  interest.  While  iny  method  may  be  used  to  generate  v,  cimmonly  two  dimensional  acceptance/rejectlon 
Is  used,  where  f ( v_)  «  1//R  dv  I(v)jrj  and  t(v)  *  1//R  dv^  I(v)^sj,  where  S  is  the  smallest  rectangle 

enclosing  R.  A  well-known  particular  example  Is  the  generation  of  Cauchy  variates  where  R  Is  the  unit 
circle. 

Trigonometric  functions  with  uniformly  distributed  arguments 

Some  standard  "tricks"  are  available  for  generating  random  values  of  trigonometric  functions  having 
uniformly  distributed  arguments.  They  are  suggested  for  two  dimensions  by  von  Neumann  (1951)  and 
extended  to  n  dimensions  by  Cook  (1957). 

The  problem  is  to  generate  values  of  sin(Y),  cos(Y)  and  tan(Y)  when  Y  U(0,  2ir).  There  is  no  conceptual 
problem  with  generating  tbe  Intermediate  random  variate  y  *  2nu  and  calculating  the  trigonometric 
function  directly,  but  the  following  method  is  faster  and  eliminates  the  need  for  the  subprogram  call. 

Let  (v,,  v2)  be  a  point  uniformly  distributed  over  the  unit  circle;  that  is,  f(vj,  v2)  =  1/*  If 
v^+Vj  <1  and  f(v^ ,  v2)  ■  0  otherwise.  Such  points  may  be  generated  using  the  two  dimensional  accep¬ 
tance  rejection  concept  discusses  immediately  above.  Let  a  denote  the  angle  between  the  positive  v,  axis 
and  the  vector  defined  by  the  origin  and  (v^,  v2).  Clearly,  a  n,  U(0,  2it).  Let  r  *  (V1Z+V22 )  ■ 

the  distance  of  (v^.  v«)  from  the  origin.  Then  cos(a)  =  v^/r,  sln(a)  *  v2/r,  and  tan(o)  =  v2/vj 

can  be  used  to  generate  the  trigonometric  functions.  Improvements  can  still  be  made  in  the  sin  and  cos 
which  involve  the  square  root  calculation  of  r.  Note  that  (1)  sin(a)  and  cos(a)  have  the  same  distribu¬ 
tion,  (2)  cos(a)  and  cos(2a)  have  the  same  distribution,  and  (3)  cos (2a)  *  cos2(a)  -  sin2(a),  which 

yields  (v2  -v1  )/(vj  +v2  )  as  random  values  for  either  sin(a)  or  cos(a). 

This  Idea  is  used  to  generate  Cauchy  random  variates  as  xxv?/v,  and  by  Knop  (1973)  for  the  dipole 
distribution.  The  polar  method  for  generating  normal  random  variates,  as  given  in  Marsaglia  and  Bray 
(1964)  is  also  based  on  these  concepts. 

von  Neumann's  comparison  method 

von  Neumann  (1951)  gave  a  method  for  generating  exponential  random  variates  which  involves  only  comparing 
uniform  random  numbers  and  no  exponential  level  calculations.  Forsythe  (1972)  extended  the  ideas,  based 
on  a  comment  at  the  end  of  von  Neumann's  paper,  to  the  normal  distribution  and  any  others  satisfying  the 
differential  equation  f'(x)  +  b(x)f(x)  =  0  for  0  <  x  <  ».  The  concept  has  been  used  by  Ahrens  and 
Dieter  (1973),  Dieter  and  Ahrens  (1973),  and  Brent  (1974)  and  extended  further  by  Monahan  (1979). 


3.  CRITERIA  FOR  ALGORITHM  COMPARISON 


Before  we  discuss  algorithms  for  specific  distributions,  we  list  here  some  criteria  which  are  useful  both 
when  developing  algorithms  and  when  selecting  an  algorithm  for  a  particular  situation. 

1 .  Accuracy 

1 .a  Theoretical 

l.b  Error  Induced  by  U{0,  1)  numbers  not  being  random 

1. c  Error  induced  by  computer  arithmetic  --  Monahan  (1977) 

2.  Execution  speed 

2.  a  Set-up  time  —  Apostolopoulos  and  Schuff  (1979) 

2. b  Marginal  execution  time  —  Greenwood  (1976) 

3.  Ease  of  Implementation 

3. a  Number  of  lines  of  code 

3.b  Support  routines  required 

3.c  Bit  manipulation  reouired  v 

4.  Portability  —  Greenwood  (1977) 

5.  Memory  requirements 

6.  Interaction  with  variance  reduction  techniques  —  Franta  (1975) 

This  list  Is  In  no  particular  order  of  importance.  In  fact,  an  Important  point  is  that  the  criteria  to 
be  used  differ  from  application  to  application,  making  it  impossible  to  order  criteria  In  order  of 
Importance.  This  makes  it  Impossible  to  select  a  "best"  algorithm  except  in  the  very  uncommon  case  where 
an  algorithm  Is  better  than  all  others  in  terms  of  every  criterion.  On  the  other  hand,  many  published 
algorithms  are  dominated  by  other  algorithms  in  that  there  is  no  situation  where  the  algorithm  Is  the  best 
choice.  However,  even  then,  a  poor  algorithm  may  be  the  best  selection  because  it  Is  already  Implemented. 


4.  STATE  OF  THE  ART 


Having  discussed  the  fundamental  concepts  for  generating  random  variates  In  Section  2  and  criteria  for 
evaluating  algorithms  in  Section  3,  we  now  discuss  the  state  of  the  art  In  each  of  several  specific 
areas:  continuous  univariate  distributions,  discrete  univariate  distributions,  continuous  multivariate 
distributions,  discrete  multivariate  distributions,  point  processes,  time  series,  order  statistics,  and 
geometric  problems. 


4.1  Continuous  Univariate  Distributions 

Without  a  doubt,  continuous  univariate  distributions  have  received  more  attention  In  the  literature  than  j 

any  of  the  other  topics  considered  here.  About  half  of  the  references  of  this  paper  fall  in  this  | 

category.  Within  the  family  of  univariate  continuous  models,  the  normal ,  gamma,  and  beta  distributions 
are  the  most  common  topics,  in  that  order. 

The  normal  distribution 

The  first  exact  method  for  generating  normal  variates  exactly,  given  by  Box  and  Milller  (1958),  yields 
pairs  of  independent  standard  normal  variates  using  r  =  (-21n(u,))'/2,  a  *  2ttu2,  x^  =  r  sin(a),  and 

X2  *  r  cos(o).  The  validity  of  the  algorithm  can  be  shown  directly  via  change  of  variables.  A  more 

intuitive  explanation  is  to  note  that  (r,  a)  are  the  polar  coordinates  of  (x.,  x_).  If  X,  and  X-  a-e 

Independent  standardized  normal  random  variables,  the  bivariate  density  functionals  symmetric  about  the 

origin.  Implying  that  u  is  a  U(0,  2ir)  variate,  and  implying  that  the  squared  distance  from  the  orig'n 

r2  =  x.z+X22  has  a  chi-square  distribution  with  two  degrees  of  freedom.  Noting  that  this  chi-square 

distribution  is  the  exponential  distribution  with  mean  2  yields  the  algorithm  from  the  point  of  view  of 

necessary  conditions  for  Xj  and  x2  to  be  independent  standardized  normal  variates.  • 

Marsaglia  and  Bray  (1964)  mention  an  Improvement  to  the  Box-Muller  algorithm  which  was  developed  in  I 

Marsaglia  (1962)  and  based  on  the  trigonometric  results  descussed  in  Section  2.4.  Noting  also  that  J 

v.z+v22  n,  U(0,  1)  conditional  on  v,z+v22  £  1  yields  the  algorithm  (1)  generate  (vj,  V2)  uniformly  1 

distributed  over  the  circle  with  urtlt  radius  centered  on  the  origin,  (2)  set  s  *  v-j2+v22,  (3)  set 
c  «  (-2  MsJ/s)1*,  (4)  set  x1  =  c  v1  and  (5)  set  x2  «  c  v2> 

While  these  two  early  algorithms  are  based  on  special  properties  of  the  normal  distribution,  later 

algorithms  have  been  primarily  composition  and  acceptance/ rejection  based.  At  the  assembler  language 

level,  where  bit  manipulation  Is  easy,  the  composition  based  algorithm  of  flarsaglia,  MacLaren  and  Bray 

(1964)  is  very  fast.  Not  as  fast,  but  requiring  no  bit  manipulation,  is  the  composition  algorithm  of 

Klnderman  and  Ramage  (1976).  The  are  many  algorithms  which  are  easy  to  Implement,  but  not  as  fast.  i 


Marsaglia  (1961c,  1964),  Klnderman  and  Monahan  (1976)  and  Schmeiser  (1980)  present  algorithms  for  random 
variates  from  the  tails  of  the  distribution.  Tail  variates  may  also  be  obtained  using  the  inverse  trans¬ 
formation,  which  Is  considered  In  Abramowitz  and  Stegun  (1964),  Beasley  and  Springer  (1977),  Burr  (1967), 
Hill  and  Davis  (1973),  Muller  (195B),  Odeh  and  Evans  (1974),  Page  (1977),  Ramberg  and  Schmeiser  (1972), 
Schmeiser  (1980),  and  Wetherlll  (1965). 

Other  references  on  normal  variate  generation  Include  Ahrens  and  Dieter  (1972,  1973),  Bell  (1968),  Best 
(1979),  Brent  (1974),  Burford  and  Willis  (1978),  Butcher  (1961),  Chay,  Fardo  and  Mazumdar  (1975),  Chen 
(1971),  Dieter  and  Ahrens  (1973),  Forsythe  (1972),  Gates  (1978),  Gebhardt  (1964),  George  (1976),  Klnderman 
and  Monahan  (1977),  Klnderman,  Monahan  and  Ramage  (1975),  Kronmal  (1964),  Marsaglia  (1961c),  Marsaglia, 
Ananthanarayanan  and  Paul  (1976),  Miklich  and  Austin  (1976),  Morltsas  (1973),  Milller  (1959b),  Payne 
(1977),  Pike  (1965),  Pullln  (1980),  Sakasegawa  (1978),  Shafer  (1962),  Shepherd  and  Hynes  (1976),  Sibuya 
(1962),  Swlck  (1974),  Tadikamalla  (1978c),  Tadlkamalla  and  Johnson  (1977),  and  C.S.  Wallace  (1976). 

The  state  of  the  art  of  normal  variate  generation  is  very  good.  No  matter  what  criteria  are  applicable, 
there  are  algorithms  which  are  satisfactory.  This  is  not  surprising  since  the  normal  distribution  has 
only  one  shape,  thereby  allowing  variates  to  be  generated  with  no  overhead  for  setting-up  constants. 

The  simple  transformation  of  multiplying  by  the  standard  deviation  and  adding  the  mean  yields  all  possible 
normal  distributions.  Gamma  and  beta  variate  generation  are  more  difficult  because  the  shape  of  the 
distribution  changes  ai  a  function  of  the  parameters. 

The  gamma  distribution  , 

the  gamma  distribution  with  shape  parameter  a>0  has  density  function  f(x)*x“  e  /r(a)  I(x)j0>a>j. 

Multiplying  by  the  scale  parameter  8>0  yields  a  mean  of  aS  and  variance  aB*.  Several  other  distribu¬ 
tions  are  special  cases:  The  exponential  with  mean  S  when  a«l,  the  Erlang  when  a  is  Integer,  the  chi- 
square  with  n  degrees  of  freedom  when  a«n/2  and  B»2,  and  the  normal  In  the  limit  as  a  *  ».  We  discuss 
the  exponential.  Erlang  and  chi-square  distributions  before  we  consider  the  general  gamma  distribution. 

The  classic  method  of  generating  exponential  variates  Is  the  Inverse  transformation  x  *  -  8  1n(1-u). 

Other  methods  Include  the  rectangle,  wedge,  tall  algorithm  of  MacLaren,  Marsaglia  and  Bray  (1964),  the 
comparison  method  of  von  Neumann  (1951)  discussed  In  Section  2.4,  modifications  to  the  comparison  method 
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by  Ahrens  end  Dieter  (1972),  Marsaglia  (1961a)  with  a  modification  by  Sfbuyu  (1962),  and  polynomial  sam¬ 
pling  in  Ahrens  and  Dieter  (1972).  The  Monte  Carlo  results  in  Ahrens  and  Dieter  (1972)  show  their  algo¬ 
rithm  SA  to  be  the  fastest  available  In  assembler  language  and  the  Inverse  transformation  to  be  the 
fastest  In  FORTRAN.  This  author.  In  unpublished  Monte  Carlo  results,  found  a  slightly  faster  FORTRAN 
level  algorithm  on  a  CDC  CYBER  72  in  1978  to  be  (1)  set  y  *  -  ln(u,u2),  (2)  set  x,  ■  u,y  and  (3)  set 
x.  *  y  -  x. .  Here  u-  partitions  the  Erlang  (with  mean  2)  variate  y  Into  two  independent  exponential 
variates,  'in  terms  of  computational  comparison  to  the  inverse  transformation.  It  trades  a  U{0,  1) 
generation  for  a  logarithm  computation.  With  the  additional  overhead  of  the  pointers  necessary  to  keep 
track  of  the  two  exponential  variates,  this  new  algorithm  Is  about  10%  faster  than  the  Inverse  transfor¬ 
mation.  A  more  general  algorithm  studied  was  to  partition  an  Erlang  (with  mean  k)  variate  y  by  k-1 
U(0,  1)  order  statistics  to  obtain  k  Independent  exponential  variates  with  mean  1,  but  k«2  proved  to 
be  the  fastest  and  easiest  to  Implement. 

Erlang  variates  with  mean  k  have  classically  been  generated  using  the  special  property  that  the  sum  of 
k  exponential  variates  have  the  desired  distribution.  Using  the  Inverse  transformation  and  some  algebra 
yields  x  *  -  ln(Hf  u.).  This  Is  an  excellent  algorithm  for  small  values  of  k,  but  execution  time  grows 
linearly  with  k,  making  the  use  of  the  more  general  gamma  algorithms  discussed  below  faster  for  large  k. 

2 

The  classical  method  of  generating  chi-square  random  variates  with  n  degrees  of  freedom  has  been  x  «  y+z 
where  y  Is  an  Erlang  variate  with  k  the  largest  Integer  less  than  or  equal  to  n/2  and  z  is  standard 
normal  If  n  Is  odd  and  Is  zero  if  n  Is  even.  The  special  case  of  n=2  Is  the  exponential  distribution 
with  mean  2.  For  large  values  of  n,  the  general  algorithms  for  the  gamma  distribution  are  faster. 

The  earliest  exact  method  for  generating  a  gamma  variate  for  any  a  >  0  is  due  to  Jbhnk  (1964),  which  is 
written  In  German.  Fishman  (1973)  discusses  the  algorithm,  which  Is  x  =  y  ♦  wz,  where  y  Is  an  Erlang  k 
variate,  w  is  an  exponential  variate  with  mean  1,  and  z  is  a  seta  variate  with  parameters  y  and  1-y, 
where  k  Is  the  Integer  portion  of  a, and  y  Is  the  fractional  portion.  Again  the  dependence  on  Erlang 
variates  makes  this  algorithm  Inefficient  for  large  values  of  a,  making  the  general  algorithms  discussed 
below  faster. 

As  late  as  the  mid  197Q's,  approximate  algorithms  where  being  published,  since  exact  methods  were 
unacceptably  slow  for  large  values  of  a.  These  Include  Phillips  ( 1 97 1 ) »  Phillips  and  Beightler  (1972), 
Ramberg  and  Schmeiser  (1974),  P.amberg  and  Tadikamalla  (1974),  and  Wheeler  (1974,  1975).  See  also  Bowman 
and  Beauchamp  (1975).  All  are  approximations  to  the  inverse  cdf  and  should  not  be  considered  In  light 
of  the  current  state  of  the  art.  Approximations  yielding  machine  accuracy  Inverse  transformations  may 
be  found  In  Best  and  Roberts  (1975)  and  Bhattacharjee  (1970).  Since  the  evaluation  of  the  Inverse 
transformation  Is  usually  performed  by  Iteratively  evaluating  the  cdf,  Gautschl  (1979)  is  of  Interest. 

See  also  Narula  and  Li  (1977). 

Exact  algorithms  which  execute  In  time  relatively  Insensitive  to  a  are  now  plentiful.  Schmeiser  and  Lai 
(1980)  give  algorithm  G4PE  which  has  the  smallest  execution  time  per  variate  for  large  values  of  a, 
but  Its  set-up  time  makes  It  not  fastest  when  only  one  variate  is  needed.  Best  (1978b)  gives  a  simple 
algorithm  with  almost  no  set-up  time.  There  are  many  algorithms  which  provide  a  continuum  In  tradeoff 
between  set-up  time  and  marginal  execution  time  between  these  two  algorithms.  When  a  very  fast  normal 
generator  Is  available,  Marsaglla's  (1977)  algorithm  RGAflA  is  very  fast.  Most,  but  not  all,  recent  algo¬ 
rithms  are  valid  for  a  >  1,  since  JShnk's  (1964)  algorithm  Is  quite  acceptable  for  a  <  1. 

Other  references  Include  Ahrens  and  Dieter  (1974),  Atkinson  (1977),  Atkinson  and  Pearce  (1976),  Cheng 
(1977),  Cheng  and  Feast  (1979),  Oagpunar  (1978),  Dieter  and  Ahrens  (1974),  Fishman  (1976),  Franklin  and 
Sen  (1975),  Greenwood  (1974),  Klnderman  and  Monahan  (1978),  McGrath  and  Irving  (1973),  Popescu  (1974), 
Tadikamalla  (1978a,  1978b),  C.S.  Wallace  (1976),  N.D.  Wallace  (1974),  Whittaker  (1974),  Berman  (1971), 
and  Locks  (1976).  Takahashl  (1959),  In  Japanese,  may  also  be  of  Interest. 

The  beta  distribution 

The  beta  distribution  with  shape  parameters  p  >  0  and  q  >  0  has  density  function 
f(x)  ■  xp'1(l-x)q*1/  0(p,  q)  Ux)<0>  ]). 

where  B(p,  q)  is  the  beta  function.  The  mean  Is  p/(p+q)  and  the  variance  Is  pq/((p+q)2(p+q+l)). 

Special  cases  Include  the  uniform  distribution  when  p  =  q  ■  1,  the  arcsln  distribution  when  p  *  q  ■  >i, 

the  gamma  distribution  in  the  limit  as  p  ♦  «,  q'-»  «,  and  p/q  remains  constant;  and  the  normal  distribution 
In  the  limit  as  p  -*  ®,  q  ♦  *>,  and  p  =  q.  When  p  and  q  are  both  less  than  1,  the  density  function  is  U 
shaped,  with  the  density  function  Infinite  at  x  *  0  and  x  »  1.  When  exactly  one  of  p  and  q  are  less  than 
1,  the  distribution  is  J  shaped,  and  when  both  p  and  q  are  greater  than  1,  the  distribution  Is  unimodal. 
This  diversity  of  shapes  makes  the  beta  distribution  an  important  model  of  real  world  phenomena  (often 
after  rescaling  to  the  interval  (a,  b)),  but  this  same  diversity  makes  developmentof  beta  variate  genera¬ 
tion  aigortihms  difficult.  Most  algorithms  consider  only  one  shape  of  the  beta  distribution,  requiring 
the  use  of  a  combination  of  algorithms  to  obtain  variates  efficiently  for  all  parameter  values. 

As  with  gamma  generation,  early  algorithms  dealt  with  special  cases.  Fox  (1963)  suggested  the  use  of 

U(0,  1)  order  statistics  when  p  and  q  are  Integer.  A  classical  general  technique  Is  x  *  w/(w+y)  where 

n  \  gamma  (a*p)  and  y  n,  gamma  (a«q),  which  results  In  a  reasonable  algorithm  when  good  gamma  generators 


are  used.  JBhnk  (1964)  gave  an  algorithm  valid  for  any  parameter  values,  tut  which  has  execution  times 
which  grow  rapidly  with  p  and/or  q. 

• 

Interest  In  beta  variate  generation  was  spurred  by  Ahrens  and  Dieter  ( 1974a)  who  used  a  normal  majorizing 
function  with  mean  p/(p+q)  truncated  at  zero  and  one  to  obtain  algorithm  BN.  Execution  time  Is  least 
when  p  and  q  are  close  to  the  limiting  normal  case  of  large  and  equal  values.  Execution  time  in  the 
limiting  case  as  the  beta  approaches  the  gamma  (p  and  q  large  and  unequal)  Is  asymptotically  infinite 
since  the  heavier  tails  of  the  gamma  distribution  force  a  poor  fit  by  the  normal  majorizing  function. 

The  first  algorithm  which  executes  in  finite  time  for  all  paiameter  values  p  >  1  and  q  >  1  is  BB,  which 
is  developed  in  Cheng  (1978).  Algorithm  B4PE  developed  in  Schmeiser  (1980)  has  marginal  times  about  half 
of  those  of  BB,  but  the  set-up  time  is  longer  and  B4PE  requires  more  lines  of  code. 

Atkinson  and  Whittaker  (1976,  1979)  consider  J  shaped  beta  distributions  having  one  parameter  less  than 
and  one  parameter  greater  than  1 . 

Other  references  are  Arnason  (1972),  Atkinson  (1979c),  Bankdvi  (1964),  BSkdssey  (1964),  Best  (1978a), 
Dieter  and  Ahrens  (1974),  Locke  (1976),  and  Schmeiser  and  Shalaby  (1980).  hajumder  and  Bhattacharjee 
(1973)  consider  the  inverse  transformation. 

Other  continuous  distributions 

6ther  continuous  univariate  distributions  have  received  considerably  less  attention.  Often  only  a  single 
paper  has  been  written  for  a  particular  distribution.  Ue  slnply  list  the  relevant  references  here. 

Inverse  Gaussian  (Wald)  distribution:  Michael,  Schucany  and  Haas  (1976). 
von  Mises  distribution:  Best  and  Fisher  (1979). 

Ansari-Bradley  W  statistic:  Dinneen  and  Blakesley  (1976). 

Weibull  distribution:  L6ger  (1973). 

Exponential  power  distribution:  Johnson  (1979)  and  Tadikamalla  (1980). 

Stable  distribution:  Bartels  (1978)  and  Chambers,  Mallows  and  Stuck  (1976). 

Lognormal  distribution:  Chamayou  (1976). 

Student's  t  distribution:  Kinderman  and  Monahan  (1978),  Kinderman,  Monahan  and  Ramage  (1975,  1977)  and 
Pearson  family:  Cooper,  Davis  and  Done  (1965)  and  McGrath  and  Irving  (1973).  Best  (1978a). 

Dipole  distribution  (a  generalization  of  the  Cauchy):  Knop  (1973). 

Cauchy  distribution:  Arnason  (1974),  Monahan  (1979),  and  Robinscn  and  Lewis  (1975). 

Kolmog orov-Smlrnov  statistic:  Devroye  (1980c). 

Burr  and  Pareto  distributions:  Popescu  (1977). 

Extreme  value  distribution:  Goldstein  (1963). 

Generalized  (four  parameter)  gamma  distribution:  Tadikamalla  (1979a). 

Weibull,  normal,  gamma  and  beta  tails:  Schmeiser  (1980) 

Devroye  (1980a)  considers  variate  generation  when  only  the  characteristic  function  is  known. 

Ramberg  (1975),  Ramberg  and  Schmeiser  (1972,  1974),  Burr  (1942,  1973),  N.L.  Johnson  (1947),  Ramberg, 
Tadikamalla,  Dudewicz  and  Mykytka  (1979),  Johnson,  Tietjen  and  Beckman  (1980),  Schmeiser  (1977),  and 
Schmeiser  and  Deutsch  (1977)  discuss  various  general  families  of  distributions  for  which  variate 
generation  is  straightforward. 


4.2  Univariate  Discrete  Distributions 

The  Poisson  and  binomial  distributions  have  received  the  most  attention  of  all  the  univariate  discrete 
distributions.  Johnson  (1974)  dovelopes  a  unifying  theory  for  discrete  variate  generation. 

The  Poisson  distribution 

The  Poisson  distribution  has  mass  function  f(x)  *  e'v  p*/x!  for  x  *  0,  1,  2,  ...  ,  where  p  is  the  mean 
and  the  variance  of  X. 

There  are  three  approaches  appropriate  when  p  Is  small.  The  Inverse  transformation,  Implemented  using 
the  recursion  f (x )  *  f(x-l)  p/x  almost  always  dominates  the  equally  easy  to  implement  algorithm  based 
on  simulating  a  homogeneous  Poisson  process  with  rate  1  for  p  time  units,  which  is  discussed  in  Kahn 
(1956)  and  Schaffer  (1970).  Both  methods  require  a  set-up  involving  exp(-p).  When  the  mean  changes 
often,  a  "thinning"  algorithm  which  requires  no  set-up  is  faster.  To  generate  variates  for  changing  mean 
in  the  range  (0,  y) ,  generate  a  Poisson  variate  with  mean  y  and  reject  each  event  with  probability 
1-(p/y).  which  is  equivalent  to  using  the  product  of  a  Poisson  variate  y  with  mean  y  and  a  binomial 
variate  with  parameters  n»y  and  p=y/p.  Thinning  algorithms  are  further  discussed  in  the  section  on  point 
processes. 

Generating  Poisson  variates  when  p  is  large  has  posed  a  more  substantial  problem  over  the  years.  Ahrens 
and  Dieter  (1974a)  give  a  composition  algorithm  with  execution  time  increasing  with  iii  and  a  method  based 
on  gamma  variates  with  time  increasing  in  1 n(y).  Fishman  (1976b)  surveys  the  Poisson  variate  literature 
and  gives  algorithm  P1F  which  sets-up  quickly  and  has  low  marginal  execution  times  for  moderate  values  of 
p,  although  execution  time  Increases  with  /p. 
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Akinson  (1979a)  gives  the  first  algorithm  which  Is  exact  and  has  execution  time  which  does  not  go  to 
Infinity  as  u  ■*  «.  Algorithm  PA  is  based  on  acceptance/rejection,  used  a  logistic  majorizing  function, 
and  evaluates  ln(x!)  by  tabling  values  for  x  through  200.  Using  Stirling's  approximation  for  ln(xl) 
with  enough  terms  to  provide  machine  accuracy,  rather  than  the  tabled  values, allows  PA  to  be  used  for 
any  large  value  of  p. 

Oevroye  (1980e)  gives  algorithm  IP  which  is  based  on  composition.  The  inverse  transformation  is  used 
for  the  left  tail.  The  body  of  the  distribution  is  handled  via  acceptance/rejection  and  a  normal 
|  majorizing  function.  The  right  tail  is  handled  with  an  exponertial  majorizing  function.  Evaluation  of 

I  x!  is  performed  explicitly  via  x(x-l )(x-2) • • • (2) ,  but  is  seldon  necessary  due  to  the  use  of  preliminary 

acceptance  and  rejection  comparisons.  Execution  times  are  very  stable  as  p  ♦ 

Schmeiser  and  Kachlvichyanukul  (1980)  give  algorithm  P2PE  using  composition.  Each  of  three  subdensities 
are  handled  via  acceptance/rejection.  The  tails  have  exponential  majorizing  functions  and  the  body  of 
the  distribution  has  a  uniform  majorizing  function.  Using  the  <inderman  and  Ramage  (1976)  normal 
generator  with  IP,  P2PE  requires  about  half  the  marginal  execution  time  and  PA  is  about  half  again  slower. 
!  For  one  variate,  IP  and  P2PE  require  about  the  same  time,  due  to  P2PE  taking  longer  to  set-up.  For  more 

than  one  variate,  P2PE  Is  preferred.  However,  if  a  very  fast  assembler  language  normal  generator  Is  used, 
|  IP  will  perform  better  than  with  the  FORTRAN  level  normal  generator. 

Other  Poisson  references  Include  Atkinson  (1979b),  Bolshev  (1965),  Kufnagel  and  Kerr  (1969),  Molenaar 
!  (1970),  Pak  (1975),  Snow  (1968),  and  Tadikamalla  (1979b). 

I  The  binomial  distribution 

The  binomial  distribution  has  mass  function  f(x)  *  (J,)  p  ( l-p)n-x  for  x  *  0,  1,  ...,  n.  The  mean  Is  np 
and  the  variance  is  np(l-p). 

When  np  is  small,  the  inverse  transformation  with  recursion  f(x)  =  f(x-l)  (n-x+1)  (p/(l-p))  /  x  Is  good. 
When  n  Is  small,  summing  n  Bernoulli  trials  each  having  probability  of  success  p  works  well. 

1  For  moderate  values  of  n,  the  use  of  Chen  and  Asau's  (1974)  index  table  for  searching  the  inverse  cdf 

is  fast,  but  as  n  goes  to  infinity,  either  the  size  of  the  table  or  the  execution  time  becomes  infinite, 
as  does  the  set-up  time.  Similarly  for  Walker's  (1977)  alias  method.  Norman  and  Cannon's  (1972)  tabling 
procedure  works  well  If  rounding  the  probabilities  Is  acceptable. 

Relies  (1972)  and  Ahrens  and  Dieter  (1974a)  give  algorithms  whose  execution  times  Increase  only  slowly 
with  the  mean,  based  on  the  binomial  distribution's  relationship  with  the  beta  distribution. 

There  are  two  exact  algorithms  which  have  finite  execution  time  as  n  and  np  go  to  Infinity.  Fishman 
(1979)  suggests  using  an  acceptance/rejection  algorithm  with  a  Poisson  majorizing  function.  Using  any 
of  the  three  Poisson  algorithms  requiring  finite  time  yields  a  finite  time  binomial  generator.  The  other 
algorithm  is  Devroye  and  Naderisamani  (1980). 

9  The  negative  binomial  distribution  n+x-1  n  x 

The  negative  binomial  distribution  has  mass  function  f ( x )  «  (  x  )  pn  (1-p)  for  x  »  0,  1,  ...  .  The 
mean  is  n(l-p)/p  and  the  variance  is  n(l-p)/p2.  It  is  also  called  the  Pascal  distribution  when  n  Is 
integer,  in  which  case  it  can  be  viewed  as  the  sum  of  n  geometric  random  variables  with  probability  of 
success  p  and  x  is  the  number  of  failures  before  n  successes.  The  geometric  distribution  Is  the  special 
case  of  n=1. 

Geometric  random  variables  may  be  generated  directly  using  the  Inverse  transformation  x  -  Uln(l-u)/ 
ln(l-p)J,  where  [yj  denotes  the  largest  integer  less  than  or  equal  to  y.  Of  course  sunmlng  n  geometric 
variates  results  In  execution  times  which  Increase  linearly  with  n. 

As  suggested  by  Devroye  and  Naderisamani  (1980),  a  negative  binomial  variate  for  any  n  and  p  can  be 
generated  in  a  reasonable  amount  of  time  by  generating  a  gamma  (a=n,  6=( 1 -p) /p)  variate  y  and  then 
generating  a  Poisson  variate  x  with  mean  y,  which  Is  an  example  of  continuous  composition  for  a  discrete 
random  variable.  Ldger  (1973)  also  discusses  the  negative  binomial  distribution. 


4.3  MULTIVARIATE  DISTRIBUTIONS 


The  generation  of  random  vectors  (X,,  X-,  ....  X  )  having  specified  properties  Is  substantially  harder 
than  the  generation  of  univariate  rindom  variates.  The  marginal  distributions  of  the  X  's  need  to  be 
correct  while  at  the  same  time  some  form  of  dependence  between  the  variables  must  be  established. 
Schmeiser  and  Lai  (1980a)  survey  multivariate  input  models  for  simulation,  including  continuous  and 
discrete  random  vectors,  point  processes,  time  series,  and  order  statistics. 


Continuous  multivariate  distributions 

A  common  problem  is  to  need  to  have  the  marginal  distributions  and  dependence  structure  specified  by  the 
joint  density  function  f(xj,  X2,  ....  xn).  Composition,  acceptance/rejection,  and  conditional  dlstrlbu- 
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tlons  are  applicable,  the  first  two  being  straightforward  extensions  of  the  univariate  concepts.  The 
use  of  conditional  distributions  reduces  the  multivariate  problem  to  n  univariate  problems  by  using 
the  algorithm  (1)  generate  xt  from  fi(x),  (2)  generate  x3  from  f2(xo|xi),  (3)  generate  x3  from 

*2^ *  and  s0  on*  wh,1e  **  is  very  general,  the  use  of  conditional  distributions  is  often 
Intractable. 

Often  in  simulation  input  modeling,  however,  the  data  can  le  used  to  estimate  the  conditional  distribu¬ 
tions  directly,  making  generation  via  conditional  distributions  straightforward.  See  Kottas  and  Lau 
(1978),  Ellon  and  Fowkes  (1973),  and  Johnson  (1976). 

The  multivariate  normal  distribution  has  been  the  subject  of  more  papers  than  any  other  multivariate 
topic  considered  here:  Barr  and  Slezak  (1972),  Bedall  and  Zimmerman  (1976),  De4k  (1978,  1979a,  1979c), 
Hurst  and  Knop  ( 1 972)  i  Jansson  (1964),  Page  (1974),  Scheuer  and  Stoller  (1962)  and  Schmeiser  and  All 
(1978).  Franklin  (1965)  discusses  the  related  topic  of  Gaussian  processes. 

Several  authors  have  considered  various  multivariate  gamma  distributions.  Mitchell,  Paulson  and 
Beswick  (1977)  generate  bivariate  exponential  random  vectors  with  any  positive  correlation  and  some 
negative  correlations.  (The  paoer  says  that  any  correlation  between  -2.5  and  1  can  be  obtained,  but 
this  is  obviously  a  misprint.)  Ronning  (1977)  and  Prdkopa  and  Szdntai  (1978)  present  multivariate  gamma 
distributions  and  generation  methods  for  nonnegative  correlations.  Schmeiser  and  Lai  (1979)  give  a 
family  of  algorithms  for  bivariate  vectors  having  any  gamma  marginal  distributions  and  any  correlation 
consistent  with  the  marginal  distributions,  including  negative  correlations. 

Macomber  and  Myers  (1978)  consider  multivariate  beta  distributions.  Arnason  (1972)  considers  the 
Dirichlet  distribution,  which  has  all  beta  marginal  distributions. 

Chalmers  (1975)  and  Dempster,  Schatzoff  and  Wermuth  (1977)  generate  random  correlation  matrices. 

Chambers  (1970)  and  Smith  and  Hocking  (1972)  consider  generation  of  Hishart  matrices  and  Gleser  (1976) 
generates  noncentral  Wishart  distributions.  Odell  and  Feiveson  (1966)  generate  sample  covariance 
matrices. 

Coleman  and  Saipe  (1973),  Gargano  and  Tenenbein  (1977),  Johnson  and  Ramberg  (1977b),  and  Johnson  and 
Tenenbeln  (1979)  discuss  bivariate  distributions  having  U(0,  1)  marginal  distributions.  Multivariate 
uniform  distributions  are  important  primarily  because  (Fs‘'(u,),  F-'^Up),  ....  Fn''(un))  has  exactly 
the  specified  marginal  distributions  and  by  modifying  the  correlrtTon  structure  of  thenun1form  random 
vector,  various  correlation  structures  can  be  obtained  in  the  multivariate  distribution  of  interest. 

The  major  problem  with  this  approach  is  that  the  correlation  between  x.  and  x.  must  be  determined  via 
numerical  integration.  1  3 

Although  not  in  the  context  of  random  variate  generation,  Kimeldorf  and  Sampson  (1975a,  1975b)  provide 
the  basis  for  a  wide  range  of  multivariate  uniform  distributions.  They  advocate  the  study  of  multivariate 
distributions  via  the  distribution  of  ( F, ( X, ) ,  Fp(X2),  ....  F  (X  )).  Since  each  F.(X.)  has  a  U ( 0 ,  1) 
distribution,  analysis  of  the  correlation  structure's  easiernaf?er  this  transformitiJn.  This  suggests 
an  algorithm  of  the  following  type:  (1)  generate  (z,,  z_,  .  ..z  )  from  any  n-dimensional  multivariate 
distribution  (the  multivariate  normal  being  the  obvious  choice),  and  (2)  calculate  x^  =  F^-1(4(z^)) 

for  1  »  1,  2,  ....  n,  where  <!>(•)  denotes  the  cdf  of  the  normal  distribution.  Still  the  problem  remains 
that  the  correlation  between  x^  and  Xj  must  be  determined  via  numerical  integration. 

Hull  (1977)  uses  this  method  (although  there  is  no  Indication  that  he  was  influenced  by  Kimeldorf  and 
Sampson's  work)  to  approximate  the  correlation  by  matching  points  on  the  regression  curve  E(X1|x2). 
Johnson  (1976)  discusses  direct  transformation  from  one  multivariate  distribution  to  another. 

Mardia  (1970)  offers  a  good  discussion  of  bivariate  distributions.  Moran  (1967)  and  Whitt  (1976)  con¬ 
tain  good  discussions  of  the  correlations  which  are  theoretically  possible  for  given  marginal  distribu¬ 
tions.  Other  references  include  Arnold  (1967),  Friday  (1976),  Johnson  (1949),  Johnson  and  Ramberg 
(1977a),  McArdle  (1976)  and  Pearson  (1925). 

Discrete  multivariate  distributions 

Little  work  has  appeared  on  discrete  multivariate  distributions.  Fishman  (1978a)  and  Ho,  Gentle  and 
Kennedy  (1979)  discuss  the  multinomial  distribution.  Kemp  (1976)  and  Kemp  and  Loukas  (1978a,  1978b) 
consider  the  generation  of  bivariate  dicrete  distributions  in  general.  Boyett  (1979)  gives  an  algorithm 
for  generating  R  x  C  contingency  tah’es.  See  also  Wakimoto  (1976). 

Point  Processes 

Generation  cf  point  processes  is  most  commonly  encountered  when  providing  arrivals  of  customers  to  a 
system.  The  simplist  case  Is  that  of  independent  Poisson  arrivals  with  constant  rate  u,  which  is  most 
commonly  handled  by  generating  exponential  Interarrival  times  with  mean  1/p  and  adding  the  time  to  the 
time  of  the  last  arrival.  Complications  arise  when  the  interevent  times  are  not  exponential  or  the  rate 
varies  as  a  function  of  time  or  the  state  of  the  system. 

A  Poisson  point  process  with  rate  p(t)  which  varies  with  time  is  called  a  nonhomogeneous  Poisson  point 
process  (NHPP).  A  NHPP  can  be  generated  using  the  inverse  transformation,  composition,  acceptance/rejec¬ 
tion,  and  special  properties. 
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Cinlar  (1975)  gives  the  Inverse  transformation,  which  he  terms  the  time  scale  transformation.  Let 


•  A(ti-r  ti)  ■  ,  v(t)  dt* 

which  is  the  expected  number  of  Poisson  arrivals  between  times  t^  .  and  t,. 
next  arrival  T^,  conditional  on  the  time  of  the  last  arrival  t^,  is 


The  cdf  of  the  time  of  the 


'Wi  (t<)  “ 1  *  exp('A(tf-i*  v}- 

Since  T.  Is  a  continuous  random  variable,  FT  ^  U(0,1).  Setting  FT  , .  (t.lt.  ,)  ■  u  and 

1  'l  1^1-1  Wl  1  1*1 

solving  for  t.  yields  the  Inverse  transformation  algorithm,  which  for  many  simple  NHPP’s  Is  closed 
form.  For  example,  if  p(t)  =  2ct,  the  inverse  transformation  algorithm  is  t,  *  (t?  .-lnd-uj/c)^. 
Kaminsky  and  Rumpf  (1977)  also  discuss  the  inverse  tra ns  formal  ion.  1  1-1 


The  special  property  is  that  Poisson  processes,  like  Poisson  random  variables,  can  be  added.  Consider 
n  NHPP's  having  rate  functions  p . ( t ) .  for  i  *  1 ,  2 . n.  'hen  merging  the  events  from  the  n  Indepen¬ 

dent  processes  yields  a  NHPP  wi  Ln  rate  function  u(t)  *  y  {i-.) . 

The  acceptance/rejection  concept  In  the  context  of  NHPP's  is  commonly  termed  "thinning."  Here  events 
from  one  NHPP  are  accepted  or  rejected  to  obtain  events  from  another  NHPP.  Let  p’(t)  >_  p(t),  where 
u'(t)  Is  chosen  so  that  the  inequality  is  close  and  events  from  the  NHPP  having  rate  function  p'(t)  are 
easy  and  fast  to  generate.  The  thinning  concept  is  to  generate  events  with  rate  p'(t)  and  to  accept 
each  event  with  probability  p( t ) /jj  1  ( t. ) ,  where  t  Is  the  time  of  the  event.  See  Lewis  and  Shedler  (1979b). 


Lewis  and  Shedler  (1976)  discuss  generating  events  when  p(t)  -  exp  (p^  +  p^t)  and  Lewis  and  Shedler 
(1979a)  consider  p(t)  -  exp  (pg  +  p^t  +  p2t2)  for  NHPP's. 

Jacobs  and  Lewis  (1977)  and  Laurance  and  Lewis  (1977)  discuss  point  processes  having  correlated  expon¬ 
ential  Interevent  times.  Fishman  and  Kao  (1977)  discuss  parameter  estimation  and  generation  of  Inter¬ 
event  times  using  a  harmonic  function  to  model  the  expected  Interevent  time  conditional  on  t.  .  to 
obtain  nonhomogeneity.  They  also  consider  nonexponential  Interevent  times.  Kimbler,  Davis  And  Schmidt 
(1980)  consider  estimating  and  generating  point  processes  when  the  data  Is  in  the  form  of  counts  and  are 
nonPoisson. 


Time  series 

Time  series  having  normal  marginal  distributions  were  studied  by  Franklin  (1965).  Coleman  and  Salpe 
(1977)  note  a  correct  method  for  generating  time  series  having  lognormal  marginal  distributions.  Gaver, 
Lavenberg  and  Price  (1973),  Lawrance  and  Lewis  (1977,  1978),  Jacobs  and  Lewis  (1977)  and  Schmelser  and 
Lai  (1979)  consider  time  series  having  gamma  marginal  distributions.  Price  (1976)  and  Hoffman  (1979) 
generate  binary  time  series.  Fraker  and  Hippy  (1974),  Kaplan  and  Orr  (1976),  Nawathe  and  Rao  (1979), 
Polge,  Holliday  and  Bhagavan  (1C73)  and  Tagil  (1963)  consider  various  related  problems,  as  do  LI  and 
Hammond  (1975)  who  provide  some  additional  references. 

Order  statistics 

We  briefly  review  some  results  for  generating  order  statistics.  Schmelser  (1978a)  gives  a  complete 
survey. 

Let  X/.»  denote  the  1  th  largest  observation  from  a  sample  of  n  (not  necessarily  independent)  observa¬ 
tions.  Then  x^j  Is  the  i  th  order  statistic.  The  minimum  observation  is  *(i)»  thc  maximum  is  x^nj  and 

the  median  is  x,<  +i)/2)w^en  n  °dd.  Tl,e  nee<*  for  rar,dom  order  statistics  arises  in  many  contexts; 
reliability  Is  A '2ommon' example.  Clearly  the  direct  method  of  generating  x^  x^,  ....  x^  and  sorting 

Is  always  valid.  However,  when  n  is  large  or  not  all  order  statistics  are  needed,  considerable  savings 
are  possible  using  the  methods  discussed  here. 

First  consider  the  case  of  independent  U(0,  1)  random  variables  U^,  (i2>  ....  iin.  Schucany  (1972)  showed 

that  the  following  algorithm  is  valid  for  generating  the  order  statistics  directly  without  sorting: 

(1)  Generate  v1 ,  v2,  ....  vfl  Independent  U(0,  1). 

(2)  Set  u{n)  =  vjl/n 

(3)  Set  u(n-1)  *  u(n-1+l)  vHl1/(r"1)  1»  1,2 . • 

The  algorithm  can  be  terminated  after  k  iterations  to  obtain  only  the  top  k  order  statistics.  Execution 
time  Is  linear  In  k,  whereas  sorting  algorithm  times  increase  faster  than  linearly.  The  intuitive 
thought  behind  Schucany’s  algorithm  is  that  conditional  on  knowing  u(n.^+i)*  distribution  of  the 

remaining  n-1  order  statistics  Is  that  of  n-1  Independent  U(0,  u<  f+1«)  random  variables,  thus  permitting 
the  recursion.  Lurie  and  Hartley  (1972)  published  a  similar  '  *  1  algorithm,  the  difference  being 

that  they  generate  the  order  statistics  in  the  reverse  order: 

(l)  Generate  v1 ,  v2,  . ...  v  Independent  U(0,  1). 
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(2)  Set  u(l)  ■  1  -  v,1/n 

(3)  Set  u(1)  -  1  -  (1  -  u(1.1})  v11/‘n*<+1)  for  1  =  2.  3 . n. 

Lurie  and  Mason  (1973),  Mason  and  Lurie  (1973)  and  Rablnowltz  and  Berenson  (1974)  consider  these  Ideas 
further.  Ramberg  and  Tadlkamalla  (1978)  suggest  using  n,  beta  (1,  n-1+1)  to  bIIom  the  recursion  In 
either  algorithm  to  begin  anywhere,  rather  than  only  the'top  or  bottom.  (Note  the  relationship  to  Fox 
(1963)  who  used  the  same  relationship  to  generate  beta  variates.) 


These  algorithms  for  U(0,  1)  order  statistics  are  more  general  than  they  first  appear,  since 
•  Fc'(u/.,)  Is  a  val 


*0)  '  rX  '“(1)' 
random  variable  X. 


Id  method  for  obtaining  random  variates  for  the  1  th  order  statistic  for  any 
The  validity  follows  from  FjJ1  being  a  monotonic  function. 


Devroye  (1980d)  considers  the  case  of  u<  ,  when  n  Is  so  large  that  numerical  problems  make  the  use 
of  “(„)■  Vj’'n  impossible.  Schmeiser  1  '  (1978)  considers  the  generation  of  or  Xjflj  when  the 

observations  are  not  indentically  distributed,  but  Fy^  Is  available. 

*1 


In  other  cases,  some  kind  of  sorting  Is  required.  The  use  of  a  histogram  provides  an  approximate  sort 
In  time  proportional  to  the  number  of  observations.  Good  sorting  algorithms  require  execution  time 
proportional  to  n  ln(n),  although  for  small  samples  n2  sorts  are  reasonable.  When  only  some  of  the 
order  statistics  are  required,  the  partial  sorts  of  Chambers  (1971,  1977)  and  Floyd  and  Rlvest  (1975) 
are  useful . 


A  final  point  Is  that  when  order  statistics  are  being  generated,  the  use  of  exact  algorithms  for 
generating  x.  Is  Important.  An  insignificant  error  in  the  tail  of  the  distribution  under  regular  sam¬ 
pling  can  be’magnifled  Into  a  serious  problem  with  order  statistics,  since  extreme  observations  become 
more  likely. 

Geometric  problems 

Many  random  generation  problems  have  geometric  interpretations,  the  most  common  being  points  uniformly 
distributed  on  a  sphere  and  random  permutations  (card  shuffling). 


Muller  first  considered  the  generation  of  a  point  uniformly  distributed  on  an  n-dlmenslonal  sphere, 
let  Zj,  z2,  ....  zn  be  Independent  standardized  normal  random  variates.  Then  if  x^  =  z^2) 

for  1  ■  1,  2,  ....  n;  (x,,  Xj,  ....  xn)  Is  a  point  uniformly  distributed  on  the  n-dlmenslonal  sphere 
with  radius  one  centered  on  the  origin.  Execution  time  grows  linearly  with  n. 


Acceptance/rejection  from  an  n-dimensional  unit  cube  looks  appealllng  at  first,  but  the  ratio  of  the 
volume  of  the  sphere  to  the  cube  goes  to  zero  quickly  as  n  -*•  ».  See,  for  example,  Schmeiser  and  AH 
(1978). 

Other  references  Include  Cook  (1959),  DeSk  (1979b),  Hicks  and  Wheeling  (1959),  Marsaglia  (1972),  Slbuya 
(1964),  and  Yoshlhiro  (1977). 

Algorithms  for  generating  random  permutations  may  be  found  In  Boyett  (1979),  Elsen  (1964),  Page  (1967), 
and  Rao  (1961). 

Crain  (1978)  considers  generation  of  random  polygons  and  Hsuan  (1979)  generates  uniform  polygonal  random 
pairs.  Knop  (1970)  and  Schrack  (1972)  discuss  generation  of  random  vectors  distributed  over  a  solid 
angle.  Helberger  (1978)  considers  random  orthogonal  matrices. 


5.  SUMMARY 


The  state  of  the  art  of  random  variate  generation  has  changed  greatly  In  the  last  ten  years.  Fast,  exact 
and  easy  to  Implement  algorithms  are  available  for  most  common  univariate  distributions.  Order  statistics 
and  nonhomogeneous  Poisson  point  processes  are  much  more  tractable  than  they  were  a  few  years  ago.  Multi¬ 
variate  gamma  vectors  with  any  correlation  structure  can  now  be  generated,  although  as  with  many  multi¬ 
variate  generation  problems  numerical  integration  is  involved.  Several  families  of  distributions  which 
are  much  more  general  than  the  commonly  used  distributions  have  been  developed. 
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